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ABSTRACT 

I propose a new procedure to estimate the False Alarm Probability, the measure of 
significance for peaks of periodograms. The key element of the new procedure is the 
use of generalized extreme- value distributions, the limiting distribution for maxima of 
variables from most continuous distributions. This technique allows reliable extrapo- 
lation to the very high probability levels required by multiple hypothesis testing, and 
enables the derivation of confidence intervals of the estimated levels. The estimates are 
stable against deviations from distributional assumptions, which are otherwise usu- 
ally made cither about the observations themselves or about the theoretical univariate 
distribution of the periodogram. The quality and the performance of the procedure is 
demonstrated on simulations and on two multimode variable stars from Sloan Digital 
Sky Survey Stripe 82. 
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1 INTRODUCTION 

Several research fields of astronomy rely crucially on the 
analysis of periodic phenomena. For instance, asteroseismol- 
ogy depends on a reliable identification of excited oscillation 
modes in variable stars, in order to match theoretical stel- 
lar models to observed data ( Smolec fc Moskalik|2007 



Gri- 



gahcene et al.|[2010[ |Balona fc Dz icmbowski 20 1 1| |Antoci 



et al. 



2011 



to mention only a few). As another example, 
extrasolar planetary systems are found by periodic modula- 
tions in the light or the radial velocity of their central stars 



( Cumming et al.|1999| |Udry et al.||2007| |Mayor et al.||2009 



Dawson fc Fabrycky|2010 1. Thus, the detection and analysis 
of periodic signals in astronomical time series receives much 
attention in the literature (for a concise summary of the 
theoretical background, see Schwarzenberg-Czerny||1998 1. 



The general principles of the detection are those of sta- 
tistical hypothesis testing and model selection. A null hy- 
pothesis Ho of no periodic signal in the observed time series 
is tested against the alternative of the presence of a periodic 
deterministic component. Ho supposes most often that the 
observations are white noise (eventually Gaussian) around 
a constant mean, though the absence of periodicities do not 
exclude eventually other error distributions, errors with non- 
trivial time series structure and non-periodic stochastic pro- 
cesses like random walks. The white noise null hypothesis is 
made plausible by pre-processing the data: pre-selection of 
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candidates (e.g. quasar/star separation in a deep survey to 
filter out cases where random walks may be a good model in- 
stead of periodic light variations), and a careful assessment 
whether independence or uncorrelatedness is a sufficiently 
good approximation for the errors. The alternative hypoth- 
esis is formalised by a series of more complex models M / 
indexed by the frequency /, consisting of a periodic signal at 
/ and noise. Both the null model Mo and the collection M / 
are fitted to the data, and a test statistic 6(f) is computed 
for all models, which quantifies the improvement yielded by 
Mf over .Mo- This collection of the test statistics as a func- 
tion of / is called the periodogram. The frequency at which 
the largest improvement is achieved is accepted as the most 
likely frequency of an eventual periodic component. Then 
the decision, whether the object shows a periodic oscillation 
or not, is based on the significance assessment of the model 
improvement: the probability is computed that the realized 
periodogram maximum or a higher value is produced un- 
der the null hypothesis. This probability is termed the False 
Alarm Probability (FAP). 

This assessment is a multiple testing situation: as we do 
not know the frequency of the sought oscillation in advance, 
we compute the test statistic often at hundreds of thousands 
of frequencies. Thus, the single- value distribution F (the 
marginal distribution) of the individual test statistics is not 
directly applicable to compute the FAP. Instead, we must 
find the distribution G of the maximum of a large set of test 
statistics. 

The idea which is most commonly used in astronomy 
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to obtain G and the FAP is based on elementary proba- 
bility calculations for the maxima of M independent vari- 
ables with common distr ibution funct i on F, yielding the 



formula G(z) = F{z) M (Scargle 



1982 



1986 Schwarzenberg-Czerny 1998). This formula has the 



Home & Baliunas 



great merits of being simple, quick to compute and easily 
applicable even for large surveys in an automated way. 

However, there are numerous issues with it, both on the 
theoretical side and in practice. Astronomical time series are 
typically irregularly spaced, sparsely sampled in time do- 
main, and the periodogram is computed on an oversampled 
frequency grid. As a result, there is no set of independent or 
uncorrelated frequencies in the astronomical periodogram, 
M loses its direct meaning, and must be interpreted as an 
effective number of hypothetical independent frequencies. 
Thus, the formula itself becomes an ad hoc statistical model 
rather than a well-founded approximation. On the practical 
side, the form F(z) M is highly unstable, sensitive to mis- 
estimation of both of its components, and is not adapted 
for statistical inference. A more reliable FAP estimate, with 
sound statistical foundations and with a possibility to assess 
the uncertainty in the FAP level estimate would be desirable. 

This paper proposes to avoid the shortcomings of the 
formula F(z) M by the application of extreme-value statis- 
tics. Statistical theory proved that maxima of random vari- 
ables follow a simple 3-parameter limiting distribution, 
called the generalized extreme-value family (GEV; Fisher 
fc Tippett|1928||Gnedenko|1943[). The validity range of this 



limit theorem is very broad: it encompasses practically all 
continuous distributions and dependent variables too, under 
some mild conditions (Leadbetter 19741. Regardless of the 



underlying marginal distribution of the variables, it has one 
common parametric form, estimable with standard methods, 
and provides a stable, mathematically well-founded model 
for extrapolation to the levels required by the FAP. Instead 
of the unstable formula F(z) M , the use of the GEV is stan- 
dard practice in most sciences, industries, finance or econ- 
omy that are concerned with risk estimation of rare events 
like hurricanes, floods, droughts, internet traffic failures, or 



market crashes (for a range of applications, see e.g. Finken- 



stadt & Rootzen 20011. A first use of extreme- value the- 



ory for periodograms in astronomy was proposed by |Baluev| 
( 2008 1, providing an upper bound on FAP under some strict 



conditions on the distribution of the periodogram and under 
a low level of aliasing and spectral leakage. 

The procedure proposed here fits a GEV model to the 
tail of the distribution of the periodogram under the null hy- 
pothesis of the time series, that is, it is white noise. Then this 
is used to find either the FAP of an observed periodogram 
peak, or critical levels corresponding to pre-specified FAP 
levels. First, we construct a large number of noise time series 
corresponding to the null hypothesis. In the next step, we 
compute part of the periodograms of the constructed time 
series, and find the maxima of these partial periodograms. 
Finally, the parameters of the GEV distribution of these 
maxima are estimated, and the obtained GEV is used to 
extrapolate to the desired high levels. 

In Section [2j we first give a brief summary about fre- 
quency analysis and its particularities in astronomy, and dis- 
cuss in detail the consequent problems in the statistical hy- 
pothesis testing for the significance of periodogram maxima. 
Then we present the basics of extreme-value theory and in- 



ference, and discuss its potential and difficulties. Section [3] 
describes the proposed procedure, and explains the argu- 
ments motivating the applied techniques. Section [4] shows 
the performance of the procedure on simulations, while Sec- 
tion [5] applies the methods to two variable stars from Stripe 
82 of the Sloan Digital Sky Survey with an RR Lyrae-like 
primary frequency. Section[6]gives a summary of the results. 



2 STATISTICAL BACKGROUND 
2.1 Frequency analysis 

Suppose we have an observed time series Xi, . . . , Xn with 
N points, e.g. magnitudes or radial velocities of a star, mea- 
sured at times ti, . . . , tj\r. The exposure time and other prac- 
tical factors limit the precision of the epochs, and there can 
be found a greatest common divisor St of the time differences 
ti — tj, i,j — 1,...,N, such that we can regard the mea- 
surements as taken on a dense time grid of resolution 8t, 
consisting of epochs 0, ISt, 25t, . . . , TSt (Eyer & Bartholdi 



19991. The observed sequence can then be regarded as a 



time series taken on a regular grid, with a scarce minority 
of known observations at t\, . . . , tjv and with an overwhelm- 
ing majority of missing values at other epochs of the grid. In 
the sequel, the term 'irregularly or unevenly sampled time 
series' will refer to such an observational sequence. 

Periodic signals in an evenly sampled time series 
X\ , . . . , Xt are usually detected by the means of the pe- 
riodogram, an estimate of the spectral density of the time 
series, on a fixed frequency grid T between and / max , where 
/max < /Nyquist = l/2<5t ( |Eyer fc Bartholdi| 1999} . The clas- 
sical periodogram is defined as 



lT,x(f) 



1 

TSt 



i2irfjSt 



x, 



(1) 



For evenly observed signals, the fundamental method 
to estimate the periodogram at the Fourier frequencies is 
the fast Fourier transform. For the irregularly sampled case, 
many methods can be found in the astronomical litera- 



ture: the Deeming method ( Deeming 1975 \ that can be 
regarded as the direct generalization o f (|1 1 to arbitrary 
times; PDM-Jurkevich (|Jurkevich||1971| |Stellingwerf||1978 



Dupuy fc Hoffman|1985[); string length (|Clarke|2002[ ); Super 
Smoother (|Friedman 1984; Rcimann 1994); Keplerian peri- 
odograms ( jCumming 2004|; Lomb-Scargle and its extension, 



the generalized Lomb-Scargle method ( Lombp976 



p82| 



Zechmeister fc Kiirster|2009[ ); the FastChi2 of 



Scargle 



Palmer 



(12009 1; and for photon arrival time series, methods based on 
Rayleigh's and Kuiper's tests (Paltani 20041. The extremum 
(most often the maximum) of the estimated periodogram in- 
dicates the most likely frequency of the object, and statisti- 
cal hypothesis testing is used to decide whether the periodic 
component is significant or not. 

This hinges on the knowledge of the distribution G of 
the maximum. Most arguments for its derivation rely di- 
rectly or indirectly on the relationship F(z) M , which fol- 



lows from independency assumptions ( jScarg le 1982; Home 
|fc Baliunas||1986[ ). If there is a set of independent random 
variables Z\ , . . . , Zm with common distribution function F, 
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then the distribution of their maximum can be derived as 



Pr(max{Zi, . . . , Zm} < z) 

= Pr [Zx z, . . . , Z M < z) 

= Pr [Z\ < z) x . . . x Pr {Z M < z) 



(2) 



where the second equality is true only if the variables 
Zi, . . . , Zm are independent; otherwise, the joint probabil- 
ity cannot be decomposed into a product of the marginal 
probabilities. 

So far, few easy-to-use alternatives were proposed to 
this formula, and most procedures for FAP estimation rely 
on it (though there are bootstrap-based alternatives as in 
Paltani 2004 and Schwarzcnbcrg-Czcrny 2012, or procedures 
using empirically derived reference distributions as in |Koen| 
fc Eyer|1999 |. However, there are several drawbacks. 



• No independent set of frequencies exists for an irregu- 
lar sparse time sampling. The irregular sampling introduces 
non-vanishing correlations between sine functions of differ- 
ent frequencies, so the system of mutually orthogonal fre- 
quencies disappears. This entails the loss of any independent 
frequency systems even in a Gaussian case. Thus, an appro- 
priate derivation of the distribution of the maximum should 
use the joint multivariate distribution of the periodogram. 

• This joint multivariate distribution is degenerate. We 
use JV observations to compute n ^> N test statistic values 
8(fi) (i = 1, . . . ,n). Any mapping h : R N — ► R n , defined 
on the space of the observed random variables Xj, . . . , Jfjv 
to compute n values Zi, . . . , Z n , produces degenerate joint 
probability distributions if n > N (N/2 for periodograms, as 
we lose the phase information during the computation). The 
testing situation is therefore not just mildly, but radically 
different from the basic assumptions of Q. 

• The marginal distribution F(z) is usually unknown. 
Theoretically derived approximate marginal distributions 
for the periodogram are usually based on the asymptotic 
theory of least squares estimation or the Central Limit The- 
orem, using the orthogonality of the basis functions. How- 
ever, in many cases orthogonality does not hold. An example 
is the inclusion of a constant beside sin 2nft and cos 2-7r/t in 
the generalized Lomb-Scargle method. The latter two func- 
tions, providing the basis for Fourier analysis and the (non- 
generalized, zero-mean) least squares method, can be made 



orthogonal for irregular sampling by a phase shift (Scargle 



1982). Depending on normalization and the test statistic 



used, the marginal distribution of the periodogram can then 
be shown to be either the \ 2 or the Fisher-Snedecor or the 
beta distribution (the first is for known spectrum normal- 
ization, the second two are for empirically estimated noise 
levels depending on test statistic; see |Sch warzcnbcrg- Czerriy| 



1998). The generalized Lomb-Scargle method adds the con- 
stant function f(i) = 1 to these two basis functions, allow- 
ing for the presence of an unrestricted floating mean at all 
frequencies. This extended basis is not orthogonal. Orthog- 
onalization would require a time-consuming Gram-Schmidt 



procedure ( |Ferraz-Mello 1981 1 that is rarely done in prac- 
tice. Consequently, the % > the Fisher-Snedecor or the beta 
distribution can be accepted only as approximate models for 
the periodogram distribution. 

• Theoretical approximations may be anyway rough in 
some cases: if there are only a small number of observations 
and strong departures from normality in their tail. Approx- 




4 6 
Time[days] 



2 12 
Theoretical 



Figure 1. A simulated time series (sinusoidal signal and in- 
dependent Gaussian errors with SNR = 0.5 and frequency 
3.379865 d~ 1 ) and its quantile-quantile plot. The signal is un- 
detectable in this case. 



imate marginal distributions for the periodogram in such 
cases can be poor, as they are only asymptotically valid 
when the observations themselves are not Gaussian. Such 
a case is illustrated in Figure [I] the standardized Gaussian 
quantile-quantile plot (right panel) of the observed sequence 
(shown on the left) shows strong deviations from the straight 
line representing a true Gaussian distribution, especially in 
the high end. 

• It is necessary to estimate M and F(z). M does not cor- 
respond to the count of any interpretable or identifiable vari- 
able in a simplified model that could be straightforwardly 
derived from the testing situation, so its estimation is usu- 
ally based on fits to simulations including strong assump- 
tions. The estimated value of M is most often larger than the 
number of observations (|Horne fc~B aliunas 1986, Frescura et| 



al. 2008 Schwarzenberg-Czerny 20121, which is impossible 



in the context of an independent model. This hints at the 
ad hoc nature of the approximation F(z) M : the absence of 
independence and the degeneracy in the periodogram makes 
the simplification to an equivalent independent system hard 
to imagine. 

• The formula F(z) M is very sensitive to tiny changes in 
both M and F . Extremely high quantiles of F (of the order 
of F(z) — 0.9999) must be precisely estimated in order to 
obtain even moderate FAP levels: even with M = 25 (an 
unusually low value) and FAP= 0.01, we need z such that 
F{z) = 0.9996, since FAP = 1 - F(z) M = 1 -0.99 96 25 . Nei- 
ther theoretical nor estimated marginal distribution func- 
tions F are reliable in this range. Theoretical approxima- 
tions are usually tailored to the center of the distribution, 
while estimations practically never have data at such ex- 
treme levels, so extrapolation conveys information about our 
preliminary assumptions rather than about the true distri- 
bution at the high end. The form F(z) M , with M in the 
exponent of a highly uncertain tail distribution, can easily 
result in an erroneous estimation. 

• Moreover, the distribution F(z) M itself becomes degen- 
erate when M increases. This causes a situation contradict- 
ing a fundamental paradigm in statistical inference, namely, 
that the distribution of the test statistic should tend to a 
stable well-defined distribution when the number of observa- 
tions N increase, and in general, the quality of the estimates 
should improve. However, in the analysis of astronomical 
periodograms the increase of TV usually causes also M to 
increase, and this creates a catastrophic situation for infer- 
ence, for the following reason. Let z+ denote the endpoint 
of the marginal distribution F(z) of the periodogram. This 
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endpoint can be finite for a finite-domain distribution such 
as the uniform or the beta distribution, or oo if it has no 
finite upper limit, such as the Fisher-Snedecor or the \ 2 dis- 
tribution. Since F(z) < 1 for all z < z+, and F(z) = 1 only 
for z — z+, F(z) M tends to for all z < z+, and to 1 only 
for z — z + . Thus, when z + is finite, F(z) M tends to a step 
function with an extremely steep rising part just before z+, 
and to give a good probability estimate, we need a precision 
in z tending to infinity. When z + is infinite, F(z) — > 
everywhere on the real line, and z values corresponding to 
the necessary near-one probability levels run out to infinity. 
This instability of F(z) M implies that no stable limit dis- 
tribution can be given for the estimated FAP levels, and no 
regular statistical inference can be obtained for the quan- 
tile estimates. Stabilization is necessary, and this is what is 
obtained by extreme-value theory. 



2.2 Generalized extreme-value distributions 

Extreme-value theory deals with the statistical analysis of 
low-probability events. Its development has been motivated 
by the need of estimating the probability and the level of 
rare, but possibly catastrophic events, such as the hurricane 
Katrina in 2005; the 3-day rainstorm on December 14-16, 
1999, which caused around 30,000 deaths in the coastal ar- 
eas of Venezuela; or financial market crashes that can have 
serious impact on economic stability. The theory has been 
summarized in several books (e.g. 



Leadbetter et al. 



1983 



Resnick||T987| |Embrechts et al.||1997| |Coles||2001| |Beirlant| 



et al.|2004||de Haan fc Ferreira|2006| ). Its cornerstone theo- 
rem yields a family of asymptotically valid limiting distribu- 
tions for the (normalized) maxima Z max ,n of a large number 
of random variables Z\, . . . , Z n , identically distributed ac- 
cording to a continuous distribution F(z), when n — ► oo 
(the normalization constants are irrelevant for estimation, 
as they are merged into the parameters of the distribution). 
The theorem is valid for the maxima of not only indepen- 
dent, but dependent variables too, if the dependence de- 
cays between increasingly separated extremely large vari- 
ables when n — ► oo. The extremal types theorem (Fisher & 
Tippett[l9 28 Gncdcnko 19431 states the form of this limit- 



ing family, called the generalized extreme- value distribution 
(GEV): 



G(z) 



exp 



1 + * 



z — \1 



(3) 



a > 0, 



where z is such that 1 + £(z — fj)/cr > 0. The exact formu- 
lation of the extremal types theorem can be found in the 
references given above. 

The GEV family constitutes the limiting family for the 
maxima of nearly all continuous distributions. The parame- 
ter £, called shape, is related to the tail decay of the under- 
lying distribution of F(z), and divides the GEV family into 
three well-separated subfamilies. Negative shape parameters 
provide distributions of maxima of variables with a finite up- 
per boundary, for example the uniform or the beta distribu- 
tion. In this case, z < fj, — £a, and there is zero probability to 
obtain maximum values higher than this limit. With a posi- 
tive shape parameter, there is no upper limit, the probabili- 



ties in the right tail of the GEV decay slowly as a power law, 
and there is considerable chance for the maximum to reach 
very large values. This is the limit distribution of maxima 
from heavy-tailed laws like the Student's t, or the Cauchy 
distribution, which has the same mathematical form as a 
Lorentzian profile. The case £ = 0, called the Gumbel distri- 
bution, separates these two types of distinct behaviour. It is 
defined as the limit function when £ — > 0, and has the form 

G(z)=exp{-exp(-^)} 

with z £ R. Its tail decreases exponentially, giving much 
lower probabilities to observe very high maxima than the 
case £ > 0, but these probabilities are nowhere 0, differently 
from the case of a negative shape parameter. This is the 
limit distribution of maxima of variables from distributions 
like the normal (Gaussian), lognormal, gamma, exponential 
or the chi-squared. 



2.3 Inference and diagnostics for extremes 

The estimation of a GEV model for a time series of length 
n starts usually with the division of the series into blocks 
of k observations (for example, a block can be a year for a 
sequence of daily temperature measurements spanning sev- 
eral decades). From each block, we select the maximal value 
(in the example, the maximum temperature of each year, 
say zi, . . . ,z m ), and we fit the GEV model to the maxima 
for example by maximum likelihood (see e.g. Coles 2001 \ or 
probability- weighted moment method ( jHosking et al.|1985[ ). 
The log-likelihood, which is derived from the cumulative dis- 
tribution function (J3j) by first differentiating it with respect 
to z, then computing the likelihood of the set of maxima 



Z\ , . . . , Zn 
form: 



finally taking its logarithm, takes the following 



mloga- (l + -)^log(l + £ 



■I 



Zi — (J, 



-i/€ 



(4) 



under the constraint that l+£(zj— /x)/er > for i = 1, . . . , m. 
This function is maximized with respect to its arguments 
£, a and /j, to obtain the best-fit estimates for the parame- 
ters. The estimates are asymptotically normal if £ > —0.5 



(Smith 19851, so variance-covariance matrix and standard 



errors can be straightforwardly derived for the estimates. We 
first compute the negated second derivative matrix — gg g l e 
of the likelihood function, where the parameter vector 9 de- 
notes 6 = (£, a, fi). We evaluate this matrix at the estimated 
parameter values; this is called the observed information ma- 
trix. Then the variance-covariance matrix of the estimates 
can be obtained by inverting the observed information ma- 
trix; the diagonal elements give the variance of the corre- 
sponding parameter estimate, off-diagonal elements provide 
the covariance between two estimated parameters. 

The estimated GEV distribution can then be used to 
obtain return levels, probabilities of very high levels or dis- 
tributions of maxima of longer periods. The return level ( p 
associated with the return period 1/p is the level that is ex- 
pected to be exceeded once in every 1/p blocks of the same 
length k (continuing the previous example, the C1/20 return 
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level would be the temperature which is exceeded only once 
in every 20 years). This is simply the 1 — p quantile of the 
GEV distribution: 



(1-P) = A»-| [l-{-log(l-p)}-«] , (5) 



where G _1 denotes the inverse function of G. Confidence in- 
tervals for C, a can be obtained by several methods: by para- 
metric or nonparametric bootstrap, by the delta method us- 
ing the variance-covariance matrix of the GEV parameters, 
or by profile likelihood. About the details of all these proce- 
dures, and in general about GEV modelling and inference for 
extremes, Coles ( 2001 I gives an excellent practical summary 



in its Chapter 2 and Chapter 3. 

The selection of k, the number of observations in a block 
raises a question of bias- variance trade-off. Since the GEV is 
a limit distribution when k — > oo, k must be large enough to 
provide a good extreme-value approximation, otherwise the 
model will be poor, and yields biased parameter estimation 
and extrapolation. But if k is too large, we can obtain only 
a few blocks in the series and therefore too few maxima. 
This implies a large variance of the parameter estimates. 
The choice of the block size is governed by pragmatic con- 
siderations, as in the case of annual maxima for climatic 
time series. In cases where the choice is not straightforward, 
several block size may be used to perform the analysis. Then 
the quality of each model is checked by diagnostic plots, and 
the results using the smallest block size that gives acceptable 
model diagnostics can be accepted as final results. 

There are two important and easy-to-use types of diag- 
nostic plots. The first is the quantile-quantile plot, generally 
used in statistics to check adequacy of a fitted model in- 
stead of histograms, since histograms are sensitive to bin 
size choice and not adapted to detect discrepancies in the 
tails of the fitted distributions, exactly where FAP estimates 
are expected to be precise. Let zi, . . . , z m be a collection of 
block maxima, zn\, . . . , Zr m ) the ordered sample in increas- 
ing order, and G(z) the estimated GEV distribution. The 
quantile-quantile plot consists of the points 



G" 1 



Hi) 



If the model is good, the points should closely follow a 
straight line with intersect and slope 1 without strong sys- 
tematic deviations. This plot is a direct visual comparison 
of the empirical distribution function (EDF) of the data and 
the fitted model. The EDF is defined such that it takes the 
value ^ at the points {z^}; in order to avoid to have ex- 
actly 1 outside the data range, which could cause problems, 
we apply a small correction by using m + 1 in the denom- 
inator. The same probability levels are taken by the 

fitted model at G -1 f ^pjj ■ K the model is good, the EDF 
and G(z) should be close to each other. Thus, the values 
G _1 ( m l +1 ^ and should also be nearly equal, because 
they are the quantiles belonging to the same probability 
levels — Vr in the two distributions. 

m+l 

The top row of Figure[2]shows two examples of quantile- 
quantile plots resulting from GEV fits. Both plots show ac- 
ceptable fits: the grey dots representing the observed values 
versus the model-predicted quantiles form approximately a 
straight line, and no strong systematic deviations from the 



fit can be seen, though there is some scatter at the right end 
of the distribution. Another example of a general quantile- 
quantile plot is the right panel of Figure [T] using standard 
Gaussian quantiles instead of the GEV, corresponding to 
the supposed distribution of the sample. On that plot, non- 
Gaussianity can be detected as systematic deviation from 
the straight line. 

Whether the scatter of the right end of the distribu- 
tion in Figure [2] is just due to natural fluctuations or to 
an invalid model can be better judged by the return level 
plot. It shows the return level function ( p (the inverse of the 
fitted distribution) against transformed probability values 
log[— log(l — p)], that is, the points 

{log[-log(l-p)U P }, 

where the return level £ p is given by |5| for any p, using the 
fitted GEV parameters. The transformation of the proba- 
bilities is such that the Gumbel distribution would become 
a straight line, a heavy-tailed GEV would curve upwards 
above it, and a finite-tailed one would remain below it, 
monotonically increasing, but eventually tending to a hori- 
zontal line. 

The return level plots are complemented with confi- 
dence bands, which are indispensable in judging the quality 
of the fitted model. Adding the observed points 



log 



log 1 



in + 1 



Hi) 



1. 



they usually show some scatter around the line, especially 
the highest ones. The confidence bands help to see how far 
from the fitted line they are. Despite an eventual scatter, 
a model is still acceptable if the points remain within the 
confidence bands. 

As an illustration, the return level plots corresponding 
to the quantile-quantile plots of Figure [2] are given in the 
bottom row. They are constructed using the same collection 
of maxima and the same model fit. The solid line on both 
graphs represents the fit, namely the return levels £ p given 
by equation |5| against their transformed probabilities. The 
distribution plotted on the left-hand side is finite-tailed, the 
one on the right is very close to a Gumbel distribution. The 
confidence bands give an immediate visual impression about 
the uncertainty of the estimated return levels. In both cases, 
the fitted model is good, since all the points remain well 
within the confidence bands. 



2.4 Application of extreme-value theory to 
periodogram peaks 

The classical periodogram values of an independent, iden- 
tically distributed (iid) Gaussian noise evenly sampled at 
times St, 25t, . . . , TSt, computed at the Fourier frequencies 
fj — j/(TSt), j — 1, . . . , T72, are distributed independently 
as standard exponential variables (or equivalently, apart 
from a normalizing factor, as \2 variables). Consequently, 
the extreme-value limit for its maximum is the Gumbel 
distribution. When the iid sequence is non-Gaussian, the 
classical periodogram at the Fourier frequencies is neither 
independent nor uncorrelated, though asymptotically, the 
values at any set of frequencies form an iid standard ex- 
ponential vector when T — > oo (Brockwell & Davis 2006 



Proposition 10.3.2). Davis & Mikosch (19991 have proven 
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the Gumbel limit also for the maximum at the Fourier fre- 
quencies for the classical periodogram of non-Gaussian iid 
sequences, when the mean of the sequence is zero and the 



variance is finite. Lin & Liu ( 2009 1 generalizes this further 



to continuous transformations of linear processes, which sat- 
isfy several conditions on the tail decay and the moments of 
the underlying noise process, in addition to restrictions on 
the dependence structure and summability conditions on the 
linear process. These works allow to test the hypothesis of 
periodicity against more complex dependent non-Gaussian 
processes, if the observed sequence is regularly sampled, and 
the (classical) periodogram is computed at the Fourier fre- 
quencies. 

In the evenly sampled case summarized above, increas- 
ing the number T of observations leaving St fixed leads 
directly to the increase of the number of Fourier frequen- 
cies in a fixed frequency interval, and an asymptotic limit, 
where extreme- value theory works, follows naturally. When 
turning to oversampled periodograms of irregularly observed 
time series, the situation becomes more complex. We observe 
N <C T data points, so we have a majority of missing val- 
ues in the time domain, and we compute the periodogram 
on a frequency grid that is oversampled by a factor K with 
respect to the Fourier grid. The ensuing degeneracy (see Sec- 
tion 2.1 1 in the joint distribution of the periodogram makes 
the extreme-value limits non-trivial. As the existence of the 
limiting GEV distribution is linked to only an asymptoti- 
cally weakening dependence at extreme levels, a strong de- 
pendence in a finite case does not exclude the existence of 
such a limit. It can be assured if we find particular spec- 
ifications for the relative rates of T — > oo, N — > oo and 
n — > oo, so that dependency and degeneracy in the spec- 
trum decreases. Such a specification is given by K — > 1, 
T — >• oo, N — > oo such that N/T — > 1. In this case, the 
proportion of the missing values in time-domain decreases 
towards 0, and the refined frequency grid, as determined 
by the oversampling factor K decreasing to 1, tends to the 
Fourier grid. The limit is then the evenly sampled case with 
the Gumbel distribution. 

Hence, several facts motivate testing simple extreme- 
value methods for the assessment of significance of peri- 
odogram peaks: (1) the existence of a Gumbel limit for the 
classical periodogram maxima at the Fourier frequencies for 
a large class of time series under even sampling; (2) the pos- 
sibility to define time sampling and missing value patterns 
such that in a limit N — > oo we approach an evenly sampled 
case; (3) the general nature of the extreme-value family as 
the limiting distribution of almost all continuous distribu- 
tions under broad dependence conditions; and (4) this limit, 
if exists, can be estimated in a straightforward way for the 
null hypothesis Ho- Estimation or knowledge of the number 
of independent frequencies and stringent assumptions on the 
marginal distribution of the periodogram are not necessary 
anymore. 

For our procedure, we assume therefore the existence of 
a GEV limit for the maxima of periodograms. Moreover, we 
assume that in the finite cases we are dealing with, the GEV 
provides a sufficiently good approximation to the true dis- 
tribution of the maximum. The multitude of methods used 
in astronomy, each of which has its particular distribution 
of periodogram, and the degeneracy in the probability dis- 
tribution of the periodogram suggests the use of the general 



GEV family with £ £ R, instead of the Gumbel subfamily 
with £ = found for the classical periodograms. A further 
motivation is that any asymptotic theoretical distribution 
of the periodograms can be wrong in the high quantiles, 
and strongly deviate from the exponential tail. The gener- 
alization from £ = to £ £ R admits all continuous dis- 
tributions, not only the exponentially-tailed ones. However, 
as for all tail estimation, the estimates are very sensitive to 
these underlying assumptions. Thus, the quality of the GEV 
approximation must always be checked with the diagnostic 
plots presented in Section [2. 3| 



3 ESTIMATION OF THE FAP 
3.1 Procedure 

1. Bootstrap of the original time series. 

In order to generate noise sequences under Ho, we resample 
the original observations Xi,...,Xm with replacement 
and with equal probabilities (nonparametric bootstrap) R 
times, using the same observational epochs. Thus, we create 



(r) 



,x!j r J of a whir.' noise series with 



R repetitions Xj 

approximately the same marginal distribution. 

2. Maxima of partial periodograms. 

From the frequency grid T, we randomly select L fre- 
quency intervals of K consecutive frequencies, where K 
is the oversampling factor, and L is chosen large enough 
to provide a good extreme-value approximation. This is 
equivalent to a random draw of L central frequencies 

(r) (r) 

fj , . . . , / ■ with equal probabilities, and, around each, 
taking a frequency interval containing K consecutive grid 
frequencies (K is the oversampling factor with respect to 
the Fourier grid). The periodogram must be calculated only 
at these frequencies, and only the maximum of each partial 
periodogram is needed. The output of this step is thus a 
sample of R maxima of partial periodograms of white noise 
sequences, which have distributions similar to the original 
observations. 

3. GEV modelling of the partial maxima. 

Fit an extreme-value model G(z;£, a, fi) to the R maxima, 
maximizing the log-likelihood Q, to obtain estimates £, a 
and fl for its parameters, and compute the inverse of the 
observed information matrix for their uncertainty estimates. 
Use diagnostic plots to check the quality of the fit. 

4. Extrapolation for the complete periodogram. If 
the quality of the fit is sufficiently good, use the estimated 
G(z; f , <t, ft) to find levels corresponding to the desired FAP 
values. Give confidence intervals of these levels. 



3.2 Reasoning behind the steps 

1. Bootstrap. We propose here nonparametric bootstrap 
as an alternative to assume a specific parametric form for 
the noise. Fixing a parametric distribution for it is equiva- 
lent to fixing a specific subfamily of the GEV distribution, 
corresponding to either £ < (finite tail), £ = (expo- 
nential tail) or £ > (heavy tail). For example, assuming 
Gaussianity of the observations or a distribution for the 
periodogram implies at once an exponential tail with £ = 0. 
This is a very important restriction, with crucial impact on 
the very high quantiles necessary for FAP. Also, not only 
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the estimated parameters (and quantile levels) are biased in 
such a case, but the variance of the parameter estimates is 
too small. It does not reflect our (real) uncertainty about 
the tail of the observations or the periodogram, and gives 
a false measure of how sure we can be about an eventual 
detection of periodicity. 

In addition, to check the null hypothesis of the observed 
sequence being a white noise, we should admit the possi- 
bility of deviating from any presumed distribution. In the 
situation of Figure [l] it is clear that if this sequence is in- 
deed noise, then it cannot come from a Gaussian distribu- 
tion. Generating white noise from a Gaussian distribution 
would produce on average lower periodogram maxima than 
another, non-Gaussian distribution that is closer to the ob- 
served one, since the observed upper tail seems to be heavier 
than a Gaussian distribution. Thus, Gaussian simulations 
would overestimate the significance of the maximum of the 
periodogram of the observations. 

The nonparametric bootstrap is a flexible alternative to a 
parametric bootstrap, in order to avoid such restrictions on 
the tails. It corresponds to using the empirical distribution 
function F(x) = Ysi=i I( x > -^0 instead of the true un- 
known distribution F(x) to generate a noise sequence, i.e., 
an independent, identically distributed random sample of 
the distribution of the data. The parametric and the non- 
parametric bootstrap were compared in simulations; there 
was no perceptible difference in the estimated GEV param- 
eters for the two different types of simulations. 

2. Maxima of partial periodograms. Two different 
aims motivate the particular way of the procedure to se- 
lect "blocks", subsets from periodograms of which we take 
the maxima. The principal goal is to decrease the compu- 
tational loads due to a bootstrap. At the same time, the 
reduced frequency set must reflect the fundamental charac- 
teristics of a full periodogram: the long-range dependence 
manifested in the non-vanishing correlations between sinu- 
soids of two distant frequencies, and the short-range depen- 
dencies due to the spectral leakage. The random selection 
of central frequencies throughout the whole range (0, / max ] 
provides a sample which is representative of the long-range 
dependence, whereas taking intervals of width equal to the 
oversampling factor K accounts for the effects of spectral 
leakage. Maxima of such partial periodograms carry infor- 
mation on both kinds of dependency, and the GEV fit of 
step 3. will reflect these, but a careful assessment of model 
quality is required to enable extrapolation. 

3. GEV modelling. The fitting can be done in several 
ways, for example by the method of maximum likelihood 



Smith (19851; Coles (20011. The parameter estimates, if the 



true £ > —0.5, follow an asymptotic multivariate normal dis- 
tribution (similar to the usual asymptotics of regular likeli- 
hood estimates), so inference is straightforward for the esti- 
mated model. Due to the strong dependence and the degen- 
eracy present in the periodogram, and to check whether the 
number of bootstrap repetitions R and the size of the partial 
periodograms KL provide a sufficiently good extreme- value 
approximation, it is necessary to assess model quality by the 
diagnostic plots described in Section [2. 3| 

4. Extrapolation. The model can be used for extrapo- 
lation only if the diagnostics have proven the model ac- 
ceptable. This consists of finding quantile levels of the full 
periodogram, based on the extreme-value modelling of the 



partial periodogram. A level f a corresponding to FAP = a 
in the full periodogram is exceeded only once in 1/a com- 
plete periodograms (in numbers, FAP = 0.01 means that 
out of a hundred periodograms, we can expect on average 
only one where the maximum exceeds f a ). As the com- 
plete periodogram contains n/(KL) times more test fre- 
quencies than the subsets used for estimation, this corre- 
sponds to one exceedance in n/(aKL) partial periodograms. 
Using Eq. j5J, we can then compute the desired level as 
( a — G -1 (l — [aKL]/n), and can add also confidence inter- 
vals. 

Though using only the maxima of a reduced frequency set for 
the estimation of the GEV parameters alleviates the prob- 
lems due to the time-requirements of the bootstrap, the re- 
duction implies that we need to extrapolate in order to give 
return levels of maxima of the complete periodogram. If the 
used frequency set is by far smaller than the complete peri- 
odogram, the extrapolation must reach to levels far beyond 
those used in the fit, and the estimated return levels will 
have a large uncertainty. A frequency set size closer to the 
complete periodogram provides more reliable extrapolation. 
Thus, there is a trade-off between the computational load 
and the necessary range of extrapolation. We used simula- 
tions with various deterministic patterns, error distributions 
and signal-to-noise ratios to show that the estimates are re- 
markably stable down to frequency set size L — 100 and 
repetition number R around 200-400. The choice of R and 
L can also be checked by the above mentioned diagnostic 
plots: for bad model fits, increasing L is a possible remedy, 
that is, using larger partial periodograms, and if the ob- 
served points scatter too much around the fitted line, this 
may be improved by increasing R, that is, using a larger 
number of bootstrapped noise sequences. 



4 RESULTS 
4.1 Simulations 

The performance of the procedure was assessed using two 
light curve patterns, a sine- wave of the form g(t) = 
Asin.i sin(/ s i n t) and a broken-line model for detached eclips- 
ing binaries. For the sinusoid, the light curve parameters 
were / s j n = 3.379865 d _1 and three different amplitudes 
AinA = 0.15,Ain,2 = 0.05 and A sin , 3 = 0.025 mag. 
For the eclipsing binary, / cc i = 0.4243146 d~ 1 was used, 
with the depth of the primary minimum equal to A cc i ti — 
2,1.33,1,0.67,0.33 and 0.167 mag, and the ratio between 
the depth of the secondary and the primary minima fixed to 
0.375. 

To both variability patterns, we added random noise 
generated from Gaussian distributions with time-varying 
variance, yielding the model Yi = g(ti) + e« with ei ~ 
Af(0,af). The time -varying standard deviations Oi were 
themselves random, based on a random gamma-distributed 
variable at each point, yielding an average Oi of 0.05 mag 
(the approximate signal-to-noise ratios of the three sine- 
wave simulations were thus SNRsi n ,i = A S m,i/<J = 3, 1 and 
0.5). The deviations added to the sine-wave and the eclips- 
ing binary simulations at point i were then generated from 
the Gaussian A/"(0, of). 

Epochs of observations were chosen on a time grid of 
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Figure 2. Upper panels: quantile-quantile plots for two GEV fits 
for the maxima of 1000 noise sequences generated by bootstrap, 
for 25 and for 100 night-time observations (left and right panel, 
respectively). In both cases, the bootstrapped original time series 
was the sinusoidal simulation with SNR = 1, and L = 100 ran- 
dom frequency intervals were used. Bottom row: the return level 
plots for the same data as on the panels above. Dots correspond 
to the observed values against the transformed empirical proba- 
bilities log ^— log ^1 — ^jpj- jj , solid lines to the fitted model, and 
dashed lines to a 95% confidence interval for the fitted return level 
curve based on asymptotic normality of the maximum likelihood 
estimator. 



0.005 day (about 10 minutes) with a total span of 25 days. 
Imitating random nighttime observations during 4 hours 
each night, two different sequences of epochs were randomly 
uniformly selected from 25 nightly 4-hour periods. One con- 
sisted of N = 100 observations, the other, of N = 25. The 
input data for the period search were the two sequences of 
epochs, the nine noisy light curves each sampled at both ca- 
dences as observed values and the random gamma variables 
as error bars, yielding in total 18 different time series. 

The time grid parameters led to an upper frequency 
detection limit / max = 100d -1 , a Fourier frequency set of 
Tf = {0.04, 0.08, 100}d _1 and a corresponding peak 
width of 0.04 d -1 due to leakage. An oversampling fac- 
tor K = 16 was used, providing a test frequency grid 
J" = {0.0025,0.005, . . . , 100} cT 1 with n = 40000 test fre- 
quencies. The generalized Lomb-Scargle method was per- 
formed on all 18 time series using these test frequencies, once 
without weighting, and once with weights defined by the nor- 
malized inverse squared error bars. The grey spikes in Figure 
[5] show the resulting periodograms from the non-weighted 
version for the six sinusoidal light curves with Gaussian er- 
rors. 

The procedure presented in Section[3]was performed on 
the 18 light curves. In order to check the stability and the 
variance of the estimates as a function of the number of boot- 
strap repetitions R and the number L of the test frequency 



intervals, we applied all pairwise combinations of L = 
50,100,200,300,400,500 and R = 200,400,600,800,1000 
for each of the 18 simulated light curves. We calculated also 
the complete periodogram for 2000 independent bootstrap 
noise sequences for each of the simulated light curves, in 
order to compare the model-based and the empirical high 
quantiles, corresponding to selected FAP levels. 



4.2 Model fits and diagnostics 

Each combination of number of repetitions and number 
of random frequency intervals (R, L) on each simulation 
yielded a sample of R maxima of a partial periodogram of 
size KL, which were then fitted with the GEV model. The 
quality of all models was checked by the return level and 
quantile-quantile plots described in Section [2. 2| In general, 
the models are acceptable and can be used for extrapola- 
tion, though some small deviations could be found in many 
plots, as shown in Figure [2] The model quality on average 
is somewhat worse for the shorter time series with N — 25 
than for time series with N = 100, but no other systematic 
difference can be seen with respect to R or L. The high end 
of the quantile-quantile plots often deviates slightly down- 
ward, implying that the observed periodogram maxima are 
stochastically smaller than the model estimates. The same 
effect can be remarked also on the return level plots: the 
theoretical curve is above the points corresponding to the 
observations. This leads to a conservative error in the sig- 
nificance assessment: the model-based estimated quantiles 
will be a little higher, and an observed peak will be found 
somewhat less significant. The converse error is very rare, 
which implies a lower risk of false periodicity identifications. 



4.3 Plausibility of quantile estimates and stability 

After the inspection of the diagnostic plots, all GEV mod- 
els obtained for the 18 simulations with all (R, L) combi- 
nations were used to estimate quantile levels f a belonging 
to fixed FAP levels a = 0.01 and 0.005. In order to check 
their plausibility, we created 2000 bootstrapped white noise 
sequences from each of the 18 simulated time series. The 
complete periodograms of every repetitions were computed, 
and the maxima of these selected. For a FAP equal to a, the 
proportion a of the peaks exceeding £ Q was calculated, and 
compared to a. 

Figures [3] and [4] show the results for the six sine- wave 
simulations with Gaussian noise, 1 — a as a function of L for 
the short and the long time series, respectively. The agree- 
ment between the theoretical 1 — a and the empirical 1 — & 
is visibly much better for the non-weighted method than 
for the weighted for the short time series. A possible expla- 
nation is that though weighting for least squares regression 
improves on the estimates only when the true model is fitted, 
namely, a sine-wave signal plus Gaussian noise with known 
variances, under Ho we fit only a constant plus , and weight- 
ing with the inverse variances loses its optimality property. 
An improvement with decreasing signal-to-noise ratio, when 
the deviations from normality become smaller, is visible in 
Figure [3] which supports this explanation. 

Small test frequency set sizes (L = 50 or 100) cause 
instability of the empirical exceedance proportions 1 — a, 
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Figure 3. The proportion of full pcriodogram maxima in 2000 that exceed the estimated high quantiles for FAP = 0.01 (top row) and 
FAP = 0.005 (bottom row), as a function of L, for time series length N = 25, with R = 1000. Dashed grey line with triangles show the 
results using non-weighted generalized Lomb-Scargle, solid black lines with dots is using the weighted version. The dotted lines denote 
the FAP levels. The left panels refer to SNR= 3, the middle panels, to SNR= 1, and the right panels, to SNR= 0.5. 
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Figure 4. The proportion of full pcriodogram maxima in 2000 that exceed the estimated high quantiles as a function of L, for time 
series length N = 100. The symbols and the SNR- FAP combinations are the same as in Figure [3] 



especially for the short time series. With N — 25 and for 
the weighted period search version, L > 200 seems necessary. 
Above this, a is stable, and approximates well the FAP level 
a even in the case of short time series. For the longer time 
series with N = 100, the stability is remarkable, and the 
approximation is in general good. 

The plausibility of the estimates can be easily seen if 
they are plotted against the periodograms of the time se- 
ries. In Figure [5] the decrease of the signal below detection 
level can be clearly observed, as the SNR decreases. The left 
panels show the time series of the sinusoidal light curve with 
Gaussian noise with N = 100 observations, the right panels 
the same light curve-noise combinations with N = 25. The 
levels predicted by the procedure are presented as horizontal 
lines with different line types for different L values. These 
levels reflect well the judgment based on the aspect of the 
whole periodogram. In the cases of SNR = 3 and 1 with 
N — 100 observations, the presence of a signal is obvious 



because of the absence of other comparable peaks, and ac- 
cordingly, the plotted FAP = 0.01 levels pass well below the 
peaks. In the case of the weakest signal with N — 100, the 
periodogram exhibits another group of peaks of compara- 
ble height beside the correct one. The signal is undetectable 
based on the noisy data. The quantiles, in agreement with 
the impression of non-significance, appear here well above 
the peaks, and indicates that neither of the peaks is signifi- 
cant. 

With less data (N = 25), the same plausibility can be 
observed. So scarce sampling makes even a signal of SNR 
= 3 barely detectable. A careful judgment says that there is 
very likely a signal, but the maximum of the periodogram 
does not exceed very much the secondary peak, and there 
may be a weak probability that a noise sequence produced 
this periodogram. Accordingly, the estimated quantiles for 
FAP = 0.01 hover around the highest peak: its statistical 
significance is around 0.01, that is, there is ~ 1% probability 
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Figure 5. Stability of the estimated 0.99 quantile as a function of the number of test frequencies for the sinusoidal simulations with 
Gaussian errors, TV = 100 (left column) and N = 25 (right column). The grey spikes show the periodograms of the simulated noisy sinewave 
signals, the horizontal lines are the quantile levels estimated from fitted GEV models. The different types of the lines correspond to 
different numbers of test frequency intervals L, and are the same for all panels, shown in the legend in the top left panel. For all plots, 
R = 1000 and the period search method is the non-weighted generalized Lomb-Scargle. 
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Figure 6. 95% confidence intervals of the estimated 0.99 quantile for the SNR = 1 case of the sinusoidal simulations with Gaussian 
errors. The plots are enlarged versions of the two panels of the middle row of Figure [5] For visibility, quantiles for only two values of L 
are plotted (L = 100 with grey, L = 500 with black); the thick solid lines correspond to the quantile estimates, the dashed ones to 95% 
confidence intervals from bootstrap. 



that a noise sequence produces such a peak. For the two 
other cases with smaller SNR, the signal is not detectable: 
for SNR = 1, there are many other peaks with comparable 
size, and for SNR= 0.5, the signal is lost, and the spectrum 
is dominated by peaks solely due to noise. 

The plausibility for the eclipsing binary-like time se- 
ries is very similar, with the only difference that the period 
search method finds the double of the frequency, not the cor- 
rect frequency. The generalized Lomb-Scargle method does 
so very often for eclipsing binaries, due to their two more 



or less equally-spaced minima in the light curve. The signifi- 
cance of a peak present in the periodogram is assessed in the 
same way and with the same plausibility of the results for 
the double frequency. When applied to real data, a further 
step of plotting the folded light curve can decide whether 
the analysed star is an eclipsing binary or not, and whether 
the correct or the double frequency was found. 

The stability with respect to the number of test fre- 
quency intervals L can be observed in Figure [5] too. In the 
left panels with N = 100, all estimated quantile levels cor- 
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Figure 7. Stability of the estimated quantile corresponding to FAP = 0.01 as a function of the number of bootstrap repetitions, for 
number of test frequency intervals L = 100 (black) and 500 (grey). 95% confidence levels based on nonparametric bootstrap are plotted 
as vertical bars. The plotting positions are slightly shifted horizontally only in order to avoid overlapping marks. 



responding to different numbers L of test frequency inter- 
vals are almost indistinguishably close together. Due to the 
scarce data, the estimated levels are more scattered in the 
right panels showing the TV = 25 cases, but the agreement 
is still quite good, and conveys reasonable judgments about 
the significances. The agreement between estimates with dif- 
ferent L values can be even better appreciated in Figure [5] 
This shows an enlarged picture of the estimated quantiles of 
the simulation with SNR = 1 and R = 1000 (middle pan- 
els of Figure [5| , together with bootstrap-based confidence 
intervals, offering a better opportunity to judge their stabil- 
ity. For the sake of visibility, only L = 100 and L — 500 are 
plotted. The overlap of confidence intervals confirms the sta- 
bility of the quantile estimates in a broad range of frequency 
set sizes. 



4.4 Stability with respect to R 

The weak dependence of the estimated high quantiles on 
the number R of bootstrap repetitions can be seen in Fig- 
ure [7| The upper row shows the estimated 0.99-quantiles 
for the complete periodogram of the sinusoidal signals with 
Gaussian noise with 100 observations, the bottom row shows 
them for TV = 25. In the former case, R as low as 200 can 
be combined with a number L of test frequency intervals 
as low as 100 giving stable and reliable estimates, though 
the confidence intervals (here, those originating from boot- 
strap) are larger with such small values than with L = 500 
or R = 1000. For a scarcely sampled time series, the esti- 
mates are more varying according to L or R, but over the 
whole range, they agree with each other within the confi- 
dence intervals. The stability of the estimates with respect 
to R and L allows a strong reduction of computational time, 
preserving at the same time the quality of the significance 
assessment. 



5 CANDIDATE MULTI-MODE RR LYRAE 
STARS FROM SDSS 

5.1 Data 

We give two examples of the use of the proposed procedure, 
namely, two variabl e stars from the Slo an Digital Sky Survey 
(SDSS) Stripe 82^ ( |lvezic et all|2007} . The SDSS provides 
five-band (u, g, r, i and z) photometry of more than 11000 
deg 2 of the sky. A region along the celestial equator, called 
Stripe 82, was imaged repeatedly during a 10 year long pe- 
riod, providing 5 simultaneous time series of up to ~ 100 



data points per object until Data Release 7 (Abazajian et 
aX1|2009 1. RR Lyrae and high-amplitude 5 Scuti stars were 



identified in this data set by Sesar et al. (2010 \ and Siiveges 



et al. (2012|. The classification using principal component 



analysis presented in the latter yielded a number of candi- 
date RR Lyrae stars. Among these, there were many rejected 
because of a noisy light curve or a nonsignificant primary 
frequency. We reconsider two of these stars here, a double- 
mode candidate and one with a surprisingly high secondary 
frequency. 

All the periodograms were computed by the non- 
weighted generalized Lomb-Scargle procedure between 
[0, 6] dT 1 for the double-mode candidate or [0, 40] dT 1 for the 
secondary periodogram for the high-frequency mode candi- 
date, using a resolution of 0.0001d _1 . This grid choice gave 
for both stars an oversampling factor K « 12, which was 
checked by plotting the spectral window. Candidate peri- 
odic signals were removed from a time series of either ob- 
served magnitudes or residuals by fitting cyclic cubic splines 
to the folded light curve, because this is able to remove all 
harmonics of that frequency simultaneously at the expense 
of a milder decrease of degrees of freedom than a Fourier 



1 http:/ /www. astro. washington.edu/uscrs/ivezic/sdss/catalogs/ 
S82 variables .html 
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Figure 8. Spectral window (top left), enlarged spectral window around (top right), periodograms of g-band observations and of residual 
time series after two successive pre-whitening (left panels, second to fourth row) and the corresponding folded light curves (right panels, 
second to fourth row) for star 538812 (J231332. 19-010746. 2). On the periodograms, the FAP levels of 0.01 are plotted as solid black lines, 
together with their 95% bootstrap confidence intervals. 



harmonic series. A range of smoothing parameters was tried 
for all light curves, among them one based on generalized 
cross-validation (GCV, see e.g. Ruppert et al.|2003 |. In ma- 
jority, the GCV criterion produced a reasonable smooth light 
curve. In the case of overfitting, the smoothing parameter 
was adjusted manually. The procedure proposed in Section 
[3] was performed for all observed and pre-whitened light 
curves, using R — 500, combined with L — 200 or L = 800 
for periodograms with [0,6] or [0,40] d _1 , respectively. Re- 
turn levels for the complete spectrum, corresponding to FAP 
= 0.05 and 0.01, that is, the 0.95- and 0.99-quantiles were 
then calculated from a GEV model fitted to the 500 boot- 
strap periodogram maxima, and the decision about the sig- 
nificance of the periodicity was obtained by comparing the 
peak in the observed sequence to the estimated level. 



5.2 A double- mode candidate 

The spectral window and primary and residual (/-band pe- 
riodograms of the star 538812 (J231332. 19-010746. 2) are 
shown in the left panels of Figure [8] The spectral window 
exhibits slowly decaying daily alias peaks. Its enlarged ver- 
sion in the top right panel shows in addition strong yearly 
aliasing and an oversampling factor K = 13. The star has 
a weakly significant (0.01 < FAP < 0.05) primary peak at 
the frequency 2.755272 dT 1 in g. Period search on the other 
bands supports the existence of a signal at this period: the 
periodogram maximum in u falls at a yearly alias of this 
frequency, and there is a prominent peak here in the other 
three bands as well, though those are not maxima. There are 
apparently several other periodic signals in the other bands, 
suggesting a multi-periodic nature. Pre-whitening all bands 
with the spline smoother yields a very significant peak in the 
residual spectrum at 2.05107 d~ , providing the fundamen- 
tal frequency fo of the star. The proportion between the two 
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frequencies is 0.7444, which corresponds perfectly to the Pe- 
tersen diagram of double-mode RR Lyrae stars. Performing 
pre-whitening with the same technique on the other bands 
yields similarly significant periodicities in r and i, but no 
evident signal in either u or z, probably due to the higher 
noise levels in these bands and to the faintness (~20 mag in 
g) of the star. The removal of the second cycle does not yield 
any further significant new frequency, apart from an alias of 
/o + /i, and since the residual degrees of freedom decrease 
by a value around 10 or more with each pre-whitening, the 
data set size (N — 56) does not allow any further meaningful 
investigations. 

The very weak significance of the primary peak is due 
to the strong secondary frequency, which causes large resid- 
ual noise in the folded light curve, and hence a small relative 
decrease of x 2 in the primary periodogram. The found fre- 
quencies, the position of this star on the Petersen diagram 
and on the (u — g, g — r) and period-amplitude diagram, 
together with the results of a principal component analysis 



similar to Siiveges et al. (20121, confirms its double-mode 



nature despite the only weakly significant frequency in the 
primary spectrum. 



5.3 A candidate with a high-frequency secondary 
mode 

Star 4477012 (J203120. 88-001125. 3) al s o was selected as an 
RR Lyrae candidate in Siiveges et al.| ( |2012[ ). Its main fre- 
quency found there (/o = 2.660207 d~ ), its amplitude, its 
principal components characteristics and the strong varia- 
tion in the folded g — i colour light curve suggest an RR 
Lyrae-like pulsating nature. The variable is on the bluest 
border of the RRab region of the (u — g,g — r) diagram, 
slightly off from both RRc and RRab regions on the colour- 
log(period) diagram, and far from the location of both sub- 
types on the period-amplitude diagram. A double-mode na- 
ture would put the star to an admissible position on the 
colour-log(period) plot, but this is excluded by the analysis 
of the residual periodogram of the r-band observations pre- 
sented in the second row of Figure [9] there is no indication of 
a correct secondary frequency. Instead, we find a weak peak 
at a high frequency, /i = 13.77038 dT 1 . It attains the level 
corresponding to FAP = 0.01 only in r, but the maxima of 
the residual periodograms in all other bands falls exactly at 
the same frequency. This simultaneous occurrence supports 
that the found periodicity is a real oscillation, either of the 
star, or of some external origin. 

The right panel of the second row of Figure [9] shows 
that in the folded residual curve the largest and the small- 
est residual fall exactly at the minimum and the maximum 
of the fitted sinusoid. This is so in all bands. The found fre- 
quency is thus determined by the separation of these two 
extremal observations. When omitting them, the primary 
frequency, now at /o = 3.660207 d _1 , becomes far more sig- 
nificant, as illustrated in the left panel in the third row of 
Figure [9] From the residual periodogram, presented in the 
bottom row on the left, any significant peaks disappear, and 
the five filters show no longer marked coinciding patterns. 

The decision, whether the signal of /i = 13.77038 dT 1 
exists or not, cannot be taken on purely statistical grounds. 
It depends on the decision whether the two influential obser- 
vations are true or erroneous, and whether we find it plau- 




15.0 15.5 
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Figure 10. Multivariate distribution of the 5-band observations 
in the u-, g- and r-bands (top) and in the r-, i- and z-band (bot- 
tom). The larger light grey blobs on both plots represent the ob- 
servations with crucial influence on the secondary periodogram in 
Figure [9] 



sible that only two observations out of 44 carry most of 
the information on an existing oscillation. These two obser- 
vations are of good quality, are not outliers in any of the 
bands, and their error bars in all bands are rather small. 



Moreover, in Figure 10 which shows the observed (w, g, r) 
and (r, i, z) magnitudes of the observations in 3 dimensions 
plotted against each other, the light-coloured larger dots rep- 
resenting these points fit perfectly into the joint multivari- 
ate distribution of the remaining data. Hence, rejecting them 
seems unreasonable. On the other hand, accepting them and 
thus accepting the existence of a secondary period of such 
a high frequency leads to difficulties in the interpretation of 
the frequencies and in the class determination for the star. 
The confirmation or the rejection of this frequency could be 
obtained only by more data on this object. 



6 DISCUSSION 

This paper discusses the difficulties with the most commonly 
used method of astronomy to estimate the False Alarm 
Probability in periodograms, the formula F(z) . The pro- 
posed procedure is intended to avoid its shortcomings: the 
lack of interpretation of M, the high sensitivity of the for- 
mula both to the tail of F and to M, the invalid assumptions 
in its derivation, the instability of the formula due to its de- 
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Figure 9. Periodograms of r-band observations and of residual time series after one pre-whitening (left panels) and the corresponding 
folded light curves (right panels) of star 4477012 (J203120. 88-001125. 3). The top two rows show the periodograms and the folded light 
curves including the two extremal observations, the bottom rows show the same plots when omitting these from the analysis. On the 
periodograms, the FAP levels of 0.01 are plotted as solid black lines, together with their 95% bootstrap confidence intervals. 



generacy when M increases, and the lack of inference for the 
estimated FAP or quantile levels. 

The proposed procedure combines a bootstrap of the 
original time series with the use of the GEV distribution 
instead of F. The bootstrap is used to produce white noise 
samples with similar empirical marginal behaviour as the 
observations, corresponding to the null hypothesis of no pe- 
riodic signal in the data without imposing additional distri- 
butional assumptions. For these bootstrap repetitions, par- 
tial periodograms are computed, and the GEV is used to 
estimate the distribution of their maxima. The GEV, on 
the other hand, is a well-motivated, generally valid distri- 
bution to describe probability levels of maxima from almost 
all continuous distributions. These two key elements of the 
procedure lift the dependence of the FAP on eventual mis- 
specification of the periodogram distribution F. 

The formula F(z) M is heavily impacted also by the 
effective number of independent frequencies, M. Small 
changes in its value change strongly the estimated quantiles 



or probabilities. There are no clear theoretical arguments 
that could help to compute its value for a given time series, 
or to assess whether an estimated value is reasonable or not. 
The proposed procedure avoids such issues. Moreover, stan- 
dard asymptotic normal theory can be used to give inference 
about the fitted model, and to obtain confidence intervals 
for the parameters of the GEV and for return levels, critical 
levels corresponding to fixed FAP values. 

The use of the GEV also helps to relieve the compu- 
tational loads due to the bootstrap, since it implies that 
estimates can be based on maxima of partial periodograms, 
computed only at a subset of frequencies. The quality of the 
fitted GEV model can easily be checked by diagnostic plots 
that are commonly used in all applications of extreme- value 
statistics, yielding information whether the model is good 
enough to use for extrapolation. This model diagnostic is 
imperative to do also because the astronomical periodogram 
is degenerate. Limiting theorems of extreme- value statistics 
give only sufficient conditions that do not treat this case, and 
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though necessary conditions are likely to hold, this must be 
checked in all possible ways. 

The subset of frequencies at which we calculate the par- 
tial periodograms is selected in a specific way. It contains 
enough frequencies such that extreme-value theory, which 
is based on asymptotic arguments concerning maxima of a 
huge number of variables, can be expected to provide a good 
approximation. Intervals of contiguous frequencies are ran- 
domly selected across the tested frequency range (0, /max], 
each of length at least equal to the oversampling factor. This 
ensures that they sample the most important sorts of de- 
pendencies in the periodogram: the long-range correlations 
across the whole frequency range, and the extremely strong 
local dependence due to spectral leakage. 

The results on simulations show that the provided FAP 
levels are reasonable, and convey reliable information on the 
plausibility of the null hypothesis. The confidence intervals 
on the critical levels provide further useful information on 
the uncertainty of the decision. The estimated quantile lev- 
els are remarkably stable both with respect to the number of 
bootstrap repetitions and the size of the frequency subset, 
though a somewhat higher number of bootstrap repetitions 
provides better results for short (TV = 25) time series, and 
larger numbers produce of course less uncertainty of the esti- 
mates. A setup that provides good estimates and reasonably 
narrow confidence intervals on the quantile levels, but com- 
putationally is not too heavy, takes only a few times the time 
of the calculation of the complete periodogram (less than 10 
times both in the simulations and the data examples). The 
procedure was also applied to two variable stars with un- 
known class from SDSS Stripe 82, which had marginally sig- 
nificant primary periods in the RR Lyrae range. One of these 
is proved to be a double-mode RR Lyrae, by the emergence 
of a highly significant secondary peak at a frequency that 
corresponds to the Petersen diagram of the double-mode RR 
Lyrae stars. The other star remains an unclear case. A barely 
significant secondary period was found at a high frequency. 
The weak significance is due to the fact that this period 
is mainly determined by only two observations. However, 
these observations are very unlikely to be erroneous mea- 
surements, so the presence or absence of this high-frequency 
variation in this star cannot be decided purely on statistical 
grounds. 

The new procedure, because of its stability, well- 
founded statistical background and broad applicability, is 
therefore an excellent alternative to obtain significance of 
periodicities detected in astronomical periodograms. 
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